%!TEX root = ../book.tex
\chapter{Manipulation}\label{chap:manipulation}

While grasping (\cref{chap:grasping}) is generally concerned with connecting an object to a robot's kinematic chain, the act of grasping itself is usually only a small part of a small part of the tasks involved in physically dealing with objects.

\begin{mdframed}
Think of all the possible ways in which you interact with objects on a daily basis. Identify which part of these can be classified as ``grasping'' and what is actually ``manipulation''? How many times do you need to plan for a complex sequence of actions (e.g., making coffee in the morning)?
\end{mdframed}

Oftentimes, the intention of the grasping action is to change the pose of this object in a precise, repeatable, and purposeful way.
For example, cutlery and dishes need to be in well-specified areas and aligned with each other when setting a table, merchandise needs to be neatly stacked in a shelf, and machine parts need to be assembled with each other according to a specific order.
These activities are more generally known as \textsl{manipulation}\index{Manipulation}.


The goals of this chapter are to introduce:

\begin{itemize}
\item the difference between grasping and manipulation,
\item algorithms for choosing the right grasp,
\item canonical manipulation tasks such as pick-and-place and assembly.
\end{itemize}

\section{Non-Prehensile Manipulation}
Manipulation can be thought of as a superset of grasping, that includes additional capabilities that are typically referred to as \textsl{non-prehensile}\index{Non-prehensile Manipulation}---i.e., anything but grasping. Indeed, objects can be pushed, poked, tossed, flipped, inserted, screwed-in, turned, twisted, and much more!
However, discussing all the possible ways objects might be manipulated and the many different contexts such actions would be required---which might dramatically change the approach a robot would need to chose---is well beyond the scope of this book and still matter of active research.

Many manipulation problems can be cast into a sequence of pick-and-place problems in which the possible grasp choices are appropriately constrained. For example, an object can be turned or flipped by planning a sequence of pick-and-place movements that each turn the object by a certain degree. Similarly, using two robotic arms, with one grasping an object out of the hand of the other, will allow a robot system to change an object's pose almost arbitrarily. (Which poses an object will be able to reach will depend on the object's exact geometry, the kinematics of the robotic arms, and constraints in the workspace.) So-called \textsl{in-hand manipulation}\index{In-hand manipulation} is still an active area of research as repeatedly picking and placing an object and hand-overs between different arms is considered to be too slow and otherwise impractical for many application areas.

\section{Choosing the right grasp}
While we have so far only considered the mechanics of grasping (\cref{chap:grasping}), choosing an appropriate pose for grasping an object in a specific way is an algorithmic problem.

Finding a good grasp that fully constrains an object against all possible external forces and torques, i.e. a grasp that lies within the ``grasping wrench space'' detailed in \cref{sec:grasping_friction} may be too restrictive and, oftentimes, unnecessarily so.
For example, it might be sufficient to find a grasp that constrains an object simply against gravity.
Other applications might require the grasp to constrain an object's movement also again lateral forces that happen due to acceleration.
In practice, these considerations usually lead to simple application-specific heuristics. For example, in a warehouse picking tasks \cite{correll2016analysis}, the problem can be constrained to have the robot grasp only objects that are suitable to be retrieved with a simple suction cup. Finding a good grasp is then reduced to finding a flat surface close to the object's perceived center of gravity.
When considering household tasks, such has handling and placing dishes, using silverware to pickup food, or holding a pitcher, we are often interested in very specific grasps that support the intended manipulation that follows.

Theoretically speaking, grasps such as picking up an object or opening a door by turning its knob are task-specific wrench spaces. We can then say that the grasp is ``good'', when the task wrench space is a subset of the grasping wrench space, and will fail otherwise. We can also look at the ratio between the forces actually applied to the object and the minimum force needed to perform a desired wrench. If this ratio is high---for example, when the robot grasps an object far from its center of gravity or has to squeeze an object heavily to prevent it from slipping---this grasp is not as good as one where the ratio is low, since in this latter case all of the force the robot is exerting is being efficiently utilized for the intended purposes.
Unfortunately, it is usually not possible to find close-form expressions for the grasping wrench space. Rather, one can sample the space of suitable force vectors---e.g., by picking a couple of forces that are on the boundary of the cone's base, and calculate the convex hull over the resulting wrenches.

\subsection{Finding good grasps for simple grippers}

Finding good grasps for simple grippers---i.e., those with only one or at most two DoFs, see \cref{sec:simplegrasp}---reduces the problem to finding geometries on the object that are suitable to place the gripper's jaws: that is, what we need is to find two parallel faces that are reasonably flat and at a distance that is lower than the gripper's maximum aperture.
\begin{figure}
    % \includegraphics[width=\columnwidth]{figs/graspingpointcloud}
    \def\svgwidth{\textwidth}
    \import{./figs/}{graspingpointcloud.pdf_tex}
    \caption{a) Random objects on a table, b) measurements from a laser scanner on the objects' surface, c) removal of table plane, d) connected components after segmentation, e) removal of connected components based on size, f) calculation of principal axes, g) evaluation of possible grasps based on collisions, h) physically attempting grasp.\label{fig:graspalgorithm}}
\end{figure}
In practice, an object might be perceived by a three-dimensional perception device such as a stereo camera or a laser scanner, which provides only one perspective of an object and may introduce noise and uncertainty (\cref{chap:uncertainty}). A typical grasping pipeline using such a device is shown in \cref{fig:graspalgorithm}, and proceeds as follows:
\begin{enumerate}
\item Acquisition: Obtain a ``point cloud'' or ``depth image'' of the objects of interest (\cref{fig:graspalgorithm}, b).
\item Pre-processing: Remove table plane or other points that are either too close or too far from the sensor (\cref{fig:graspalgorithm}, c).
\item Segmentation: Cluster points that are close enough, e.g., to identify individual objects (\cref{fig:graspalgorithm}, d).
\item Filtering: Filter clusters by size, geometry or other features, to down-select objects of interest (\cref{fig:graspalgorithm}, e).
\item Planning: Compute center-of-mass and principal axes of relevant clusters (\cref{fig:graspalgorithm}, f).
\item Collision-checking: Generate possible grasps and check for collisions with point clouds (\cref{fig:graspalgorithm}, g).
\item Execution: Physically test a grasp by monitoring jaw distances, as well as forces and torques at the wrist (\cref{fig:graspalgorithm}, h).
\end{enumerate}

Some of these steps might not be necessary for all grasps, and some of them might become very complicated for some task. For example, pre-processing is often used to remove known quantities such as a table surface from the data, but it might become non-trivial when e.g. removing the edges of a bin or operating with a container with an arbitrary size.

\textsl{Segmentation} is the most critical step and requires some previous knowledge about the objects to grasp such as their size or the geometry of features thereon. In \cref{fig:graspalgorithm}, clustering points based on their distance is sufficient, e.g. using the DBSCAN algorithm \cite{ester1996density}, but requires an assumption about object size in order to select a suitable threshold. Other segmentation algorithms might use surface normals, a combination of point cloud and image data such as color or patterns, or employ deep learning.

\textsl{Filtering} the resulting clusters to identify objects of interest can be as simple as rejecting those that are too small (as shown in \cref{fig:graspalgorithm}, \textsl{e}), but might also involve matching the points to a 3D model of a desired object or involving image data, using e.g. ICP and RANSAC from \cref{chap:mapping}.

A simple approach to \textsl{plan} for possible grasps is to calculate the center-of-mass as well as the principal axes of an object using principal component analysis (\cref{sec:appendix_PCA}). Other approaches might again require matching the existing points to a 3D model of the object to identify specific grasp points (such as the handle of a cup), or rely on image features to do so.

After planning all, or some, possible grasps, they need to be checked for \textsl{feasibility}. While a collision with a point in the point-cloud might rule out a grasp, local search is sometimes being used to find a collision-free variant, for example by (virtually) moving the gripper up and down as well as along the principal axes. In other applications, for example bin picking, some collisions might be ignored with the expectation that the gripper will push other objects out of its way.

Even though a grasp might look robust in a point cloud representation, it might not be effective when physically executing it. Possible failures are collisions with objects, insufficient friction with the object, or an object moving before the gripper is fully closed. For this reason, it is important to already close the gripper as much as possible before approaching the object, increasing the requirements for accurate perception.

With the recent advent of (deep) machine learning and the ability of neural networks (\cref{chap:ann}) to approximate complex functions, it is also possible to replace parts, or all of, the algorithmic steps shown in \cref{fig:graspalgorithm} with a convolutional neural network trained via deep learning. While data intensive, such an approach can seamlessly merge image and depth data and adapt to application-specific data better than a hand-coded algorithm can.

\subsection{Finding good grasps for multi-fingered hands}

The simple grasping pipeline described above is computationally expensive as there usually exist many possible grasp candidates, and each of them need to be checked for collisions. This problem becomes even more relevant when considering grippers with articulated fingers. This can be overcome by considering only a predefined set of grasps, e.g. two and three finger pinches for small objects and full-hand encompassing grasps for larger objects.

A suitable method to search the full space of possible grasps with an articulated hand is to use random sampling, i.e. moving the end-effector to random poses, closing its fingers around the object, and seeing what happens when generating wrenches that fulfill the task's requirements.
``seeing what happens'' is usually performed first in simulation, and it requires collision checking and dynamic simulation. Dynamic simulation applies Newtonian mechanics to an object (i.e., forces lead to acceleration of a body) and moves the object at very small time-steps. While this can be done using the connected components identified in the point cloud alone and assuming reasonable parameters for friction and contact points, point cloud data can also be augmented by object models to simulate whether a grasp has a high likelihood to be successful. Here, there is a trade-off in exploring the space of possible grasps in simulation and actually trying grasps with the real hardware.

\section{Pick and place}

One of the most basic manipulation problems is known as ``pick and place''. It involves grasping an object, transporting it, and placing it.
However, what looks like a simple action is actually a sequence of individual tasks that can fail for multiple reasons. Pick and place consists of the following steps (\cref{fig:pick-and-place}):

\begin{enumerate}
\item Approaching the object;
\item Grasping the object;
\item Lifting the object;
\item Moving the object to an intermediate pose;
\item Placing the object;
\item Releasing the object;
\end{enumerate}

\begin{figure}
	\centering
    \def\svgwidth{0.8\textwidth}
    \import{./figs/}{pick-and-place.pdf_tex}
    \caption{Various stages of pick-and-place (or grasping) from approach to release. Problems during an early step, such as during placing shown here, may lead to failure of later stages of the algorithm.\label{fig:pick-and-place}}
\end{figure}

Each of these actions might not work as intended, requiring the robot to abort and restart the process. For example, what seems like a reliable grasp may turn out not to be suitable for actually lifting the object. Or, a suitable path (\cref{chap:pathplanning}) toward the desired approach pose may not be found and it may be necessary to find another suitable approach pose first. This problem is known as \textsl{Task and Motion Planning} or TAMP\index{Task and motion planning}\index{TAMP}. Once a suitable approach pose has been found, placing the object will require to monitor forces and torques to ensure a gentle placement. Finally, releasing the object might require to verify the intended pose.

As failure can occur anywhere in the above pipeline and manually encoding all possible state transitions will quickly get out of hand,  Behavior Trees (\cref{sec:behaviortrees}) have emerged as a powerful tool to encode complex sensor-based action sequences. A sample behavior tree for a picking task is shown in \cref{fig:BTpick}.

\section{Peg-in-hole problems}\label{sec:peginhole}

A canonical manipulation problem and a special case of pick-and-place is the \textsl{peg-in-hole}\index{Peg-in-hole} problem and variations thereof---including hole-on-peg problems, which generalize insertion and assembly operations. Peg-in-hole requires repeatable grasping of an object and using force-torque based search motions to find the hole.
%
Typically, insertion patterns consist of tilting motions and spiral-shaped search motions \cite{watson2020autonomous}. Both approaches have advantages and drawbacks. Tilt insertion tends to work better for larger objects (excess of $1cm$ diameter) for peg-in-hole tasks.

\begin{figure}
	\centering
    \def\svgwidth{0.65\textwidth}
    \import{./figs/}{peg-in-hole.pdf_tex}
    \caption{Tilt-based peg-in-hole insertion. Arrows indicate the directions of gripper translation and rotation. Dashed lines indicate motion that results from compliant grasping. \label{fig:tilt-insertion}}
\end{figure}

Tilt insertion is detailed in \cref{fig:tilt-insertion} and proceeds as follows:
\textsl{a}) given a hole pose, the part is held vertically above the hole by a preset distance;
\textsl{b}) the gripper is then tilted about its local $y-$axis and translated horizontally in the global frame in the direction corresponding to the hand's local $x-$axis;
\textsl{c}) this offset places the lowest part of the bottom edge of the part directly above the estimated center of the hole. Then the hand is translated downwards in the global frame until the part makes contact with the hole;
\textsl{d}) naturally, the shaft cannot sink deeply into the hole when tilted this way; rather, the curvature of the hole meets the circumferential surface of the shaft and gathers it towards the center of the hole, exploiting compliance in the gripper;
\textsl{e}) after being centered, the shaft is tilted back up a few degrees beyond vertical;
\textsl{f}) finally, the shaft is returned to true vertical and pushed down into the hole until a preset reaction force (commensurate with the hole fit) is met. If the $z-$component of the hand's final pose is near to the expected value, the insertion is considered successful.

\begin{figure}
	\centering
    \def\svgwidth{0.65\textwidth}
    \import{./figs/}{hole-on-peg.pdf_tex}
    \caption{Spiral-based hole-on-peg assembly. Arrows indicate the directions of gripper translation and rotation. The spiral motion is performed in the horizontal plane and is accompanied by downward pressure. It either leads to missing the peg (\textsl{d}) or finding the peg (\textsl{e}) and complete assembly (\textsl{f}). \label{fig:spiral-insertion}}
\end{figure}

Spiral insertion is best suited for hole-on-peg operations, and peg-in-hole operations with a peg diameter less than $1cm$. It is described for hole-on-peg assemblies in \cref{fig:spiral-insertion}. The algorithm proceeds as follows:
\textsl{a}) given a shaft end pose, the grasped part---e.g. a gear with a center hole---is held vertically above the shaft by a preset distance;
\textsl{b}) the hand then moves down along the axis of insertion until a reaction force threshold is reached;
\textsl{c}) the robot then performs a spiraling motion defined in polar coordinates with its origin at the point of contact; the algorithm also probes the contact surface as to whether the reaction force against the wrist in $z-$direction falls below a threshold; it continues in this fashion until one of several conditions are met:
\textsl{d}) the gripper pose exceeds a threshold below which the intended hole pose cannot lie, or
\textsl{e}) lateral torques at the wrist exceed a threshold indicating the gear has slid onto the shaft and cannot translate laterally.
The vertical probing steps also serve to sink the part into place before attempting a lateral move that might sabotage what would otherwise be a successful insertion. Once the torque threshold is met, the hand pushes down until a preset reaction force (commensurate with the hole's tolerance) is met.
\textsl{f}) if the $z-$component of the hand's final pose is near to the expected value, the assembly is considered a success.

Implementing peg-in-hole and hole-on-peg insertion also has a large number of possible failure modes, making a behavior tree (BT) architecture most suitable.
However, force and torque-based controls requires operating at a higher bandwidth than a typical BT architecture would support (in the order of hundred of Hz and more). How to implement these closed-loop control loops is strongly dependent on the actual hardware.
For example, some robot arms do not make raw F/T values available at a rate that is required for real-time control, but provide built-in functions to move until certain force limits are met. This allows the robot manufacturer to run real-time controllers that ensure safety, which would be more difficult to accomplish when providing low-level control to the user.

\section*{Take-home lessons}
\begin{itemize}
\item Performing simple pick and place tasks is now possible thanks to high-resolution in-hand sensing and fast enough computation that can sift through large amounts of point cloud data in real-time. 
\item Planning and executing a grasp is a complex problem that encompasses topics from many of the previous chapters, ranging from identifying an object, localizing it and computing the inverse kinematics to reach it.
\item Manipulation extends these techniques from the robot itself to the objects the robot deals with and how they relate to each other, and remains an open field of research. 
\item Seemingly simple manipulation tasks such as pick-and-place or peg-in-hole assembly require both high-level planning and force-torque real-time control, posing a combined task and motion planning problem. 
\end{itemize}

\section*{Exercises}
\begin{enumerate}
\item Write code to generate rectangles with random dimensions and orientations. Rectangles can overlap. Use a point-in-polygon test to simulate random point samples on their surface, simulating a top-down view with a depth sensor.
\begin{enumerate}
\item Implement a segmentation routine that clusters objects based on a minimum distance.
\item Implement a filter that rejects connected components based on size. For which kind of objects does this work well and where does this method fail?
\item Implement a filter that rejects connected components that do not have rectangular shape. Are you able to specify a filter that works independent of the object size?
\item Apply principal component analysis to compute the principal axes of the rectangle and compare with ground truth. How does the number of samples affect the accuracy of your estimate?
\end{enumerate}
\item Use a function of the kind $u(x-i)+rand(j)$ with $u(x)$ the unit step function, $rand()$ uniformly distributed random noise, and $i,j$ suitable parameters to simulate a noisy depth-image of a cube with width $i$. Use the nearest neighbor of each point to compute its normals and a suitable clustering algorithm to identify the cube. How do $i$ and $j$ affect the accuracy of your estimate?
\item Think about simulating peg-in-hole assembly in a robotic simulator. What are the problems with using a simulation environment when simulating tight assembly problems?
\item Perform an internet search for ``open source'' robotic assembly problems and re-create them in your laboratory. Implement a spiral and tilt-based assembly controller. 
\end{enumerate}
